A Distributed Representation-Based Framework for Cross-Lingual Transfer Parsing
نویسندگان
چکیده
This paper investigates the problem of cross-lingual transfer parsing, aiming at inducing dependency parsers for low-resource languages while using only training data from a resource-rich language (e.g., English). Existing model transfer approaches typically don’t include lexical features, which are not transferable across languages. In this paper, we bridge the lexical feature gap by using distributed feature representations and their composition. We provide two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space. Consequently, both lexical features and non-lexical features can be used in our model for cross-lingual transfer. Furthermore, our framework is flexible enough to incorporate additional useful features such as cross-lingual word clusters. Our combined contributions achieve an average relative error reduction of 10.9% in labeled attachment score as compared with the delexicalized parser, trained on English universal treebank and transferred to three other languages. It also significantly outperforms state-of-the-art delexicalized models augmented with projected cluster features on identical data. Finally, we demonstrate that our models can be further boosted with minimal supervision (e.g., 100 annotated sentences) from target languages, which is of great significance for practical usage.
منابع مشابه
A Representation Learning Framework for Multi-Source Transfer Parsing
Cross-lingual model transfer has been a promising approach for inducing dependency parsers for lowresource languages where annotated treebanks are not available. The major obstacles for the model transfer approach are two-fold: 1. Lexical features are not directly transferable across languages; 2. Target languagespecific syntactic structures are difficult to be recovered. To address these two c...
متن کاملCross-lingual Dependency Parsing Based on Distributed Representations
This paper investigates the problem of cross-lingual dependency parsing, aiming at inducing dependency parsers for low-resource languages while using only training data from a resource-rich language (e.g. English). Existing approaches typically don’t include lexical features, which are not transferable across languages. In this paper, we bridge the lexical feature gap by using distributed featu...
متن کاملDistributed Word Representation Learning for Cross-Lingual Dependency Parsing
This paper proposes to learn languageindependent word representations to address cross-lingual dependency parsing, which aims to predict the dependency parsing trees for sentences in the target language by training a dependency parser with labeled sentences from a source language. We first combine all sentences from both languages to induce real-valued distributed representation of words under ...
متن کاملAnnotation Projection-based Representation Learning for Cross-lingual Dependency Parsing
Cross-lingual dependency parsing aims to train a dependency parser for an annotation-scarce target language by exploiting annotated training data from an annotation-rich source language, which is of great importance in the field of natural language processing. In this paper, we propose to address cross-lingual dependency parsing by inducing latent crosslingual data representations via matrix co...
متن کاملCross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages
In cross-lingual dependency annotation projection, information is often lost during transfer because of early decoding. We present an end-to-end graph-based neural network dependency parser that can be trained to reproduce matrices of edge scores, which can be directly projected across word alignments. We show that our approach to cross-lingual dependency parsing is not only simpler, but also a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- J. Artif. Intell. Res.
دوره 55 شماره
صفحات -
تاریخ انتشار 2016